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1 Introduction: aiming for flexibility, precision and efficiency 



The process of fitting and validating dynamic models against epidemic and outbreak data 
is often labor and computationally intensive. The net result of this is that scientists and 
public health agencies often use suboptimal methods to explain outbreak and epidemic 
data. Nevertheless, there has been intense scientific activity around what is called plug- 
and-play methods: flexible algorithms that can be applied with hardly any effort to new 
models, by just plugging a short model-specific bit of code into them. 

This effort has lead to a rich diversity of algorithms that are ready to be used. Some 
are very quick to run, because they rely on some clever mathematical or numerical ap- 
proximations, and some others offer the possibility to make no approximation whatsoever 
(aside from the model itself, of course), but they can quickly become prohibitively com- 
plex and computationally expensive to run. The goal of PLoM is to take the best out of 
this variety of methods, making it easier and more feasible to use the most precise and 
demanding algorithms, while keeping everything plug-and-play. 

In addition, by taking advantage of code templating approaches and symbolic compu- 
tation libraries, PLoM brings an extra layer of simplicity: you won't have to code a single 
line to plug your model and start playing with the PLoM library. 

I. Model ' 

Describe the system: compartments, atomic reactions, 
parameters, diffusing parameters, etc... 

S 4 

! 2. Mathematical formulation 

Markov Stochastic Ordinary j 

state space or differential or differential | 

model? equations? equations? | 



Theoretically define a likelihood function p(y\0) 



3. Numerical computation 


Practically estimate p(y\0) for 


given values of 6 


Exact estimation, 




Deterministic 


with Monte Carlo error 


or 


approximation 


4.lnference 






Maximize p(y\0) 


or 


Explore p(y\0)p(0) 


(Frequentist approach) 




(Bayesian approach) 



/ \ 

5. Explore and analise 



What do the data and the model tell us about 6? 

Figure 1: Layered representation of a classic workflow in epidemiology. PLoM lets you concen- 
trate on steps 1 and 5: model, explore, and analyse. 

Figure [I] illustrates the classic workflow of mathematical modellers. The plug-and- 
play paradigm aims at allowing you to concentrate on the first and fifth layers: modelling, 
exploration, and analysis. This document will hopefully give all you need to understand 
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about the three intermediary layers in order to take the best out of the library of methods 
provided by PLoM. 



The first intermediary layer is the mathematical formulation of the model. PLoM 
integrates the principal approaches that can be found in the literature to translate com- 
partmental epidemic models into mathematical terms. These formulations lead to dif- 
ferent ways of computing the implied likelihood, which is the second intermediary layer. 
Deterministic models can simply be integrated using Euler schemes, nonlinear stochastic 
differential equations can be integrated (with a certain level of approximation) with tools 
as the Extended Kalman Filter (EKF), and in some cases no tractable exact or approx- 
imated scheme is available. This is the case of some more general stochastic Markovian 
state space models that can only be sampled from (i.e. simulated, one scenario at a time). 
In such cases, the likelihood is estimated by propagating hundreds or thousands of sce- 
narios, or particles, through the model. Lastly, the third intermediary layer deals with 
extracting information from the data (noted y) regarding the underlying parameters of 
the model (contained in a vector noted 9). This can either be done by maximising the 
likelihood p(y\0), which is the frequentist way of doing inference, or by going Bayesian 
and exploring the posterior density p(0\y)ocp(y\9)p(6) that includes the a priori beliefs 
regarding 0, reflected by p(9). 

As we just mentioned, there is a family of methods to estimate p(y\0) that work for all 
Markovian state space models, which naturally includes ODE's and SDE's. These methods 
are called particle filters (see Doucet and Johansen |2011| for more details). Basically, 



they can be used with any Markovian compartmental model, whatever its mathematical 
formulation. Not only are these filters very generic, but they also provide an unbiased (also 
called exact) estimate of the likelihood. Nevertheless, this estimate has some variability, 
termed Monte Carlo error, that can be brought to by having the number of particles tend 
to infinity. In practice, hundreds or thousands of particles are generally needed to obtain 
acceptable levels of variability, which makes particle filters computationally expensive, as 



well as their associated inference algorithms: iterated filtering (MIF, see Ionides et al. 



20U\ ) and particle MCMC (pMCMC, see |Andrieu ~eTdI] [2010j ). What we propose is 



to make these methods more accessible by using faster (but approximate) algorithms to 
initialise them. Indeed, initialisation phases can be critically long, both for optimization 
problems or when running an MCMC: it is just practical sense to let faster methods do 
their part of the work! 

An overview of the algorithms implemented in PLoM is presented in Table [T] In the 
next section, we will illustrate on a simple example how the lhs, ks implex and kmcmc 
algorithms can be used to literally shorten the calibration phase of an adaptive particle 
MCMC. 
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Goal 


Inference 
algorithm 


Numerical 
resolution 


Mathematical 
formulations 


Cost (order of magnitude) 
per iteration 


PLoM 
command 


Maximize 

p(y\0) 
(or p(6\y)) 


Iterated 
Filtering 


Particle filter 1 


Markov models 


0(1000 x k x n) 


mif 


Simplex 


ODE integration 2 


ODE 


0(d 2 x k x n) 


simplex 


SDE integration 2 
(with EKF) 


SDE 


0{d 2 xk 2 xn) 


ksimplex 


Explore 

p(o\y) 


pMCMC 


Particle Filter 1 


Markov models 


0(1000 xkxn) 


pmcmc 


EK-MCMC 


SDE integration 2 
(with EFK) 


SDE 


Oik 2 x n) 


kmcmc 



Table 1: Algorithms implemented in PLoM, with corresponding characteristics, 
d: dimension of 0, k: dimension of the system, n: number of observations. 
1: exact estimation of the likelihood, with Monte Carlo error 
2: deterministic approximation of the likelihood 
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2 A simple illustration case: efficiently initialising a particle 
MCMC 



In this section, we will illustrate how the lhs, ks implex and kmcmc tools implemented in 
PLoM can be used to efficiently initialise a particle MCMC. We will work on a simple SI 
model, that can be found in the tutorials of the PLoM library. We incorporate environ- 
mental stochasticity by having the efficient reproduction number Rq follow a diffusion, to 
capture its changes in time. We have generated data for two cities, in which the contact 
patterns (hence Rq) may be different. The problem we will deal with is the estimation of 
the initial reproduction numbers in city 1 and 2, respectively Rq^o) and i?o(£o), and the 
length v of the infectious period, that can be considered to be the same in both cities. 
Observations are shown in Figure [2j they consist of grouped incidence over cities 1 and 2, 
monitored by different sources (CDC and Google FluTrend for example), incidence in city 
2, and prevalence in city 1. The process, contect and link files for this specific problem 
can be found in the examples library as drift, so that you can reproduce and play with 
the experiments presented here. 




Sep Nov Jan Mar May Jul Sep Nov Jan Mar May Jul 

Figure 2: Simulated epidemiological data, reproducing the complexity of real datasets: both city- 
specific and global observations, different data streams, combination of prevalence and incidence 
observations, and missing data. 

One way to obtain information regarding i?o(£o) , i?o(to) and v is to explore the 
density p(9\y), with 9 being {Rl(to), i?o(to), v}. As this cannot be done directly, it is 
common practice to use a Monte Carlo Markov Chain. Concretely, It is a process that 
travels around the 9 space (R 3 here). Even more concretely, suppose that at iteration i 
the chain is in 9{\ a new value #* is sampled from a multivariate gaussian density centered 
in 9i, which covariance will be noted If p(0*\y)p(0*), that can be estimated in different 
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ways as explained in Section 1, is higher or not too much lower than p(0i\y)p(9i) (there 
is a formula behind this), #* will be accepted and become 9i+\. If #* is not accepted, 
the chain does not move and 0i+\ is equal to Q{. The theory says that when this chain 
converges, the sequence of samples it has generated can be treated as samples from p(0\y) 
(see Andrieu et al. |2003| for more details). However, it is important to note that if a chain 



that has converged is run for 1000 iterations, for example, the output is not equivalent to 
1000 independent and identically distributed samples from the target distribution. Again, 
there is a formula that gives you what is called the "Effective Sample Size" (ESS), the 
equivalent number of really independent samples you can consider to have obtained from 
your MCMC run. One last bit of theory: it has been shown that aiming for an acceptance 



rate around 23% is good practice, in order to maximise the ESS (Roberts et al. , 19971. 



Getting back to practical considerations, what is needed to start the chain is an initial 
position #o, and a covariance matrix So- In PLoM, if the pMCMC is launched from scratch, 
it will take the guess values as initial conditions, and the square of the sd_transf 's (for 
standard deviation in the transformed space) that have been entered manually as diagonal 
terms for So- There are two things to note at this point: first of all this makes So a diagonal 
matrix, which may be suboptimal if some components of 9 are correlated (we will illustrate 
this later). Then, it is hard to guess what are good values for the sd_transf 's as it would 
be optimal to set them close from the covariance of the posterior density, which is exactly 
what we don't know and are trying to figure out. We will now give an illustration of how 
things can go wrong or at least be inefficient if a pMCMC is poorly initialised. 



2.1 Examples of poor initialisations of the pMCMC 
2.1.1 Starting from an arbitrary 9o 

To begin with, we will consider that the covariance matrix Ho has been optimally chosen 
(case d in Figure [6]), and that initial conditions for i?o(£o) ? ^o(^o) and v have arbitrarily 
been set to 13, 13 and 16 respectively. The traceplot of one corresponding output is shown 
in Figure [3j it takes more than 2000 iterations to find the mode, albeit the model used 
in this example is fairly simple. Another possible scenario is that the chain gets stuck 
in a local maximum, as shown in Figure [4j It does not mean that the pMCMC does 
not work, it is a simple consequence of the fact that the pMCMC is not an optimisation 
algorithm: it should be launched close from the mode (or at least from the highest modes, 
in a multimodal density). We will show in the following subsection how such difficulties 
can be avoided. 



2.1.2 Using an arbitrary covariance matrix Sq 



We consider here the classic, non-adaptive version of the pMCMC algorithm: the same 
sampling covariance So is used all along the chain. If sd_transf 's are set, for example, 
to 0.02 for Rq^o) , Rq^o) and v, and used to define So as shown in case a of Figure [6j 
the algorithm performs very poorly. More than a million pMCMC iterations would be 
necessary to obtain the equivalent of 100 effectively independent samples. This can be 
translated into a measure of relative efficiency by comparing asymptotically the ESS of an 
MCMC with such to the ESS obtained with an optimal covariance Yi opt . An estimate 
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Figure 3: Traceplots and corresponding outputs for a wrongly initialized pMCMC: more than 
2000 iterations are lost to find the mode, albeit the model is fairly simple 




1000 2000 3000 4000 5000 
# iterations 




1000 2000 3000 4000 5000 
# iterations 



1000 2000 3000 4000 5000 
# iterations 



Figure 4: Traceplots and corresponding outputs for a wrongly initialized pMCMC: the chain 
stabilizes in a local mode where likelihood is below -2000 (the global mode is around -700). Once 
again, such problems can occur although the model is fairly simple. 



of the optimal covariance of reference is shown in Figure [6j case d\ it has been empirically 
obtained from efficient and long runs of pMCMC on this dataset. Under this criteria, the 
arbitrarily chosen Eg leads to a pMCMC that is 10 000 times less efficient than what could 
be optimally done! 



Relative Efficiency (E) = 



ESS(Z) 
ESS(T, op t. 



2.2 Improving the efficiency of the pMCMC by relying on the lhs, 
ksimplex and kmcmc algorithms 

2.2.1 Using lhs and ksimplex to determine 6q 



As indicated in Table [TJ there are three options in PLoM to find an optimum: the mif 
algorithm, the simplex algorithm, and the ksimplex algorithm. All three could be used, 
but ksimplex may be the wisest choice. The mif algorithm does not use any approxi- 
mation, but as it is based on a particle filter it may be computationally costly to run, 
while the goal here is only to find a satisfactory approximation of the mode. The simplex 
algorithm is the fastest of all three, as it only implies an ODE integration. However, it 
skips the stochastic components of the models, which can play an important role in models 
as the one we are looking at. Here, Rq follows a diffusion to model its variations in time: 



6 



considering a model without this diffusion makes it collapse to a model with constant i?o, 
which may lead to deceiving conclusions regarding the shape of the underlying likelihood 
function. Furthermore, 9 can sometimes contain parameters that are only involved in the 
stochastic components of the model, as the volatilities of diffusing parameters, or the am- 
plitude of environmental stochasticity. Running a deterministic simplex algorithm in such 
cases will only bring incomplete information. The ksimplex algorithm is both significantly 
faster than mif, and accounts for the stochastic components of the model. It can then 
be used to efficiently and reliably initialise a pMCMC algorithm. To avoid the risk of 
being stuck in a local maximum, the simplex algorithm should be associated with the lhs 
functionality of PLoM (for Latin Hyper-Square). It will randomly generate positions in 
the #-space, where an EKF will be run. The best position can be used as an initial con- 
dition for the optimisation algorithm simplex. The simplex generally converges quickly, 
and a pMCMC run from this position leads to the outputs shown in Figure [5j that can be 
favourably contrasted with Figures [3] and [4[ 




Figure 5: Traceplots and corresponding outputs for a pMCMC with a Oo adequately tuned with 
the lhs and ksimplex algorithm: no transient phase is observed, there are no pMCMC iterations 
used for tuning 



2.2.2 Initialising the Adaptive pMCMC with an EK-MCMC 
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Relative efficiency : 52% 
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Figure 6: Four experiments where an non- adaptive pMCMC has been run with different co- 
variance matrices Sq. With each experiment, the sampling efficiency relative to the optimal one 
achieved in case d is given. 



The poor results of case a in Figure [6] show how inefficient can be a pMCMC algorithm 
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based on arbitrarily chosen sd_transf 's. It is based on the non-adaptive form of the pM- 
CMC, intro duced in|Andrieu et al. |2010|. By simply incorporating the adaptive principles 
presented in Roberts and Rosenthal "[2009 1 , the covariance matrix Eo can progressively be 
replaced by more suitable matrices (E$) empirically computed from previous samples of 
the run. This adaptation can significantly improve the pMCMC efficiency. However, the 
"learning" phase can be long, specially when Eo is as inefficient as in case a of Figure [6| 
We propose to tackle this task with a faster algorithm. 



As shown in Table [TJ there is another algorithm implemented in PLoM to explore 
posterior densities: the EK-MCMC, that simply replaces the particle filter component of 
the pMCMC by an Extended Kalman Filter that quickly provides a good approximation 
of the likelihood. Case b of Figure [6] indicates that by initialising Eo with the empirical 
covariance matrix computed from the samples of an adaptive EK-MCMC (kmcmc) algo- 
rithm, the pMCMC will directly perform at 97% of its nominal efficiency. Once again, 
there will be no need for a burn-in phase of the adaptive pMCMC if it has been adequately 
initialised using faster algorithms. The resulting posterior distributions shown in Figure 
[7] can then be obtained at a strongly reduced computational cost. 

At last, case c shows what would have been the efficiency of the pMCMC if by chance 
the sd_transf 's had been set to their optimal values (Eo still being a diagonal matrix). 
The absence of the extra-diagonal terms, in such a situation where parameters are cor- 
related, significantly decreases the efficiency of the algorithm. By almost 50% in this 
example. 




n 1 1 1 

19.0 19.5 20.0 20.5 21.0 21.5 
r0/0 



11.0 11.5 
v/0 




772 -770 -768 -766 -764 -762 
log likelihood 



Figure 7 : Posterior densities estimated from the outputs of an efficiently initialised pMCMC. 
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3 More details? Questions and answers 



3.1 Similarly, can fast methods be used to reduce the computational 
cost of iterated filtering? 

By starting with a combination of the lhs and ksimplex methods, the iterated filtering 
algorithm should start close from the modes, making it converge much more quickly. 



3.2 What about parallelisation? 

All particle-based methods (smc, mif , pmcmc) are automatically parallelised by distributing 
the particles over all available cores. This will work on your own machine as well as on 
clusters. In case you want to control the number of cores the method uses, you use can 
simply use the -P command. Parallelisation can substantially reduce the computing time 
it takes to run a particle filter! 

Nevertheless, despite parallelisation the number of operations needed to run a particle 
filter stays the same: under this perspective the EKF still provides a substantial advantage 
in any case. Furthermore, there are cases where the covariance matrix of the EKF is sparse, 
which could make parallelisation profitable. This is the case, for example, if inference is 
done simultaneously on a variety of geographic patches in which epidemic dynamics are 
partly independent, as they only influence each others through migration. This is an open 
area for research and experimentation, that could lead to further parallelisation of the 
EKF and of the particle filters. 



3.3 What is the principle of the adaptive particle MCMC and its PLoM 
implementation? 



The adaptive pMCMC implemented in PLoM is inspired from the Adaptive Metrpolis 
algorithm introduced in Roberts and Rosenthal |2009| . It is based on the idea that as 



the chain progresses, the samples generated so far give a better and better idea of the 
structure of the posterior density, and should be used to tune the covariance matrix of the 
importance sampling distribution. 

At the beginning of the chain, there are too few samples to reliably obtain a good 
estimate of the posterior density covariance. Thus, the particle MCMC (pmcmc) and EK- 
MCMC (kmcmc) algorithms implemented in PLoM work with two phases: 



1. At iteration i, use the covariance = e?So, with ei defined in the following way: 

2.38 2 

6 ° dim{6) 
Q+i = Q x exp(a n (AccRate - 0.234))} 
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where a is the cooling factor (can be controlled with the -a option of the pmcmc 
and kmcmc algorithms), and AccRate is the proportion of proposed samples #* that 
have been accepted up to this iteration. The J^fl) ratio has been shown to be the 



optimal scaling in a gaussian case (Roberts et al., 1997|. 



2. After a certain number of samples have been accepted (controlled by the -S option), 

Yii is set to Jra(fl) x ^Emp(i:i)i where ^Emp(i:i) is the empirical covariance computed 
from the samples obtained up to iteration i. 



For more details, see Roberts and Rosenthal [2009| and Dureau et al. [2012] 



3.4 On practical examples, how much less computationally expensive is 
the EKF with regards to the particle filter? 



In Dureau et al. [2012] , two application examples of how the EK-MCMC can be used to 



initialise a pMCMC have been given. The first model is an SEIR model with a diffusing 
efficient contact rate flt and observed incidence, which leads to a state vector of five dimen- 
sions. Using a simple and direct implementation, the EKF is equivalent to propagating 
one vector of dimension 5(5 + l)/2 + 5 = 20 (5(5 + l)/2 for half of the covariance matrix, 
that is symetric, and 5 for the state vector). On the other hand, it has been determined 
that between 1000 and 2000 particles (vectors of dimension 5) are necessary to obtain 
satisfactory acceptance rates in the pMCMC. Hence, in this case between two and three 
orders of magnitude in terms of computational cost are gained when using the EKF. 

In the second example, a slightly more complex SEIR model with two age classes 
is used, with both the children-to-children and adults-to-adults efficient contact rates 
diffusing, and both group-specific incidences being recorded. This leads to a state vector 
of dimension 10, meaning EKF runs equivalent to propagating a vector of dimension 65, 
while 6000 where particles (of dimension 10) considered necessary to run an acceptable 
particle filter. Once again, two to three orders of magnitude are gained when using the 
EKF. 



3.5 Are there alternatives to the particle MCMC? 



The SMC 2 algorithm (Chopin et al. 2011 ) is an interesting alternative to the particle 



MCMC. The computational complexity appears to be equivalent, and the differences will 
lie in their implementation, how they can be parallelized on clusters, and how they can 
be combined with faster algorithms. Besides, an alternative to the adaptive pMCMC 
presented here would be a pMCMC where the MCMC component is no longer a random 
walk Metropolis algorithm, but rather an independent sampler efficiently defined from a 
preliminary EK-MCMC. This is just another example, there are globally many paths to 
explore! 
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3.6 EKF can have numerical issues, how does PLoM deal with them? 

Practicionners know that nonlinear versions of the Kalman filters can have numerical issues 
that affect their stability. What has been done in PLoM is to fully exploit the adaptive 
integration tool of the GNU Scientific Library. It automatically adapts the integration 
time step to the magnitude of the variations induced by the dynamics of an ordinary 
differential equation, providing controlled numerical precision and higher stability. We 
take advantage of this approach by vectorizing the deterministic equations followed by 
the covariance matrix in the EKF. By doing so, we turn the deterministic resolution of 
the SDE performed by the EKF into an ordinary differential equation, at least for the 
integration/prediction phases. This has improved the efficiency and stability of the EKF. 
Of course, there may be more difficulties on the way, and other/better approaches to tackle 
the problem, we naturally welcome any comment or suggestion! 
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